METHOD AND APPARATUS FOR CONTROL OF RECEIVE DATA
Background of the Invention 

The invention relates generally to network data 
processing. 

Networking products such as routers require
high-speed components for packet data movement, i.e., collecting
packet data from incoming network device ports and queuing
the packet data for transfer to appropriate forwarding
device ports. They also require high-speed special
controllers for processing the packet data, that is, parsing
the data and making forwarding decisions. Because the
implementation of these high-speed functions usually
involves the development of ASIC or custom devices, such
networking products are of limited flexibility. For
example, each controller is assigned to service network
packets from one or more given ports on a permanent
basis.

Summary of the Invention 



In one aspect of the invention, receiving data from
a network includes issuing a receive request directing the
transfer of data from one of a plurality of device ports
to a buffer memory and specifying a thread from among a
plurality of processing threads to process the data.

Brief Description of the Drawings 

Other features and advantages of the invention will 
be apparent from the following description taken together 
with the drawings in which: 

FIG. 1 is a block diagram of a communication system 
employing a hardware-based multi-threaded processor;

FIG. 2 is a block diagram of a microengine employed 
in the hardware-based multi-threaded processor of FIG. 1; 

FIG. 3 is an illustration of an exemplary thread 
task assignment; 

FIG. 4 is a block diagram of an I/O bus interface 
shown in FIG. 1; 

FIG. 5 is a detailed diagram of a bus interface unit 
employed by the I/O bus interface of FIG. 4; 

FIGS. 6A-6F are illustrations of various bus
configuration control and status registers (CSRs);

FIG. 7A is a detailed diagram illustrating the
interconnection between a plurality of 10/100 Ethernet
("slow") ports and the bus interface unit;

FIG. 7B is a detailed diagram illustrating the 
interconnection between two Gigabit Ethernet ("fast") ports 
and the bus interface unit; 

FIGS. 8A-8C are illustrations of the formats of the
RCV_RDY_CNT, RCV_RDY_HI and RCV_RDY_LO CSR registers,
respectively;

FIG. 9 is a depiction of the receive threads and 
their interaction with the I/O bus interface during a 
receive process; 

FIGS. 10A and 10B are illustrations of the format of 
the RCV_REQ FIFO and the RCV_CTL FIFO, respectively; and 

FIG. 11 is an illustration of the thread done
registers.

Detailed Description 



Referring to FIG. 1, a communication system 10
includes a parallel, hardware-based multi-threaded processor
12. The hardware-based multi-threaded processor 12 is
coupled to a first peripheral bus (shown as a PCI bus) 14, a
second peripheral bus referred to as an I/O bus 16 and a
memory system 18. The system 10 is especially useful for
tasks that can be broken into parallel subtasks or
functions. The hardware-based multi-threaded processor 12
includes multiple microengines 22, each with multiple
hardware controlled program threads that can be
simultaneously active and independently work on a task. In
the embodiment shown, there are six microengines 22a-22f and
each of the six microengines is capable of processing four
program threads, as will be described more fully below.

The hardware-based multi-threaded processor 12 also
includes a processor 23 that assists in loading microcode
control for other resources of the hardware-based
multi-threaded processor 12 and performs other general purpose
computer type functions such as handling protocols,
exceptions, and extra support for packet processing where the
microengines pass the packets off for more detailed
processing. In one embodiment, the processor 23 is a
StrongARM (ARM is a trademark of ARM Limited, United
Kingdom) core based architecture. The processor (or core)
23 has an operating system through which the processor 23
can call functions to operate on the microengines 22a-22f.
The processor 23 can use any supported operating system,
preferably a real-time operating system. For the core
processor implemented as a StrongARM architecture, operating
systems such as Microsoft NT real-time, VXWorks and µCUS, a
freeware operating system available over the Internet, can
be used.

The six microengines 22a-22f each operate with
shared resources including the memory system 18, a PCI bus
interface 24 and an I/O bus interface 28. The PCI bus
interface 24 provides an interface to the PCI bus 14. The I/O
bus interface 28 is responsible for controlling and
interfacing the processor 12 to the I/O bus 16. The memory
system 18 includes a Synchronous Dynamic Random Access
Memory (SDRAM) 18a, which is accessed via an SDRAM
controller 26a, a Static Random Access Memory (SRAM) 18b,
which is accessed using an SRAM controller 26b, and a
nonvolatile memory (shown as a FlashROM) 18c that is used
for boot operations. The SDRAM 18a and SDRAM controller 26a
are typically used for processing large volumes of data,
e.g., processing of payloads from network packets. The SRAM
18b and SRAM controller 26b are used in a networking
implementation for low latency, fast access tasks, e.g.,
accessing look-up tables, memory for the processor 23, and
so forth. The microengines 22a-22f can execute memory
reference instructions to either the SDRAM controller 26a or
the SRAM controller 26b.

The hardware-based multi-threaded processor 12
interfaces to network devices such as a media access
controller device, including a "slow" device 30 (e.g.,
10/100BaseT Ethernet MAC) and/or a "fast" device 31, such as
Gigabit Ethernet MAC, ATM device or the like, over the I/O
Bus 16. In the embodiment shown, the slow device 30 is a
10/100BaseT Octal MAC device and thus includes 8 slow ports
32a-32h, and the fast device is a Dual Gigabit MAC device
having two fast ports 33a, 33b. Each of the network devices
attached to the I/O Bus 16 can include a plurality of ports
to be serviced by the processor 12. Other devices, such as
a host computer (not shown), that may be coupled to the PCI
bus 14 are also serviced by the processor 12. In general,
as a network processor, the processor 12 can interface to
any type of communication device or interface that
receives/sends large amounts of data. The processor 12
functioning as a network processor could receive units of



PATENT 

ATTORNEY DOCKET NO: 10559/1 3700 1/P7876 



packet data from the devices 30, 31 and process those units 
of packet data in a parallel manner, as will be described. 
The unit of packet data could include an entire network 
packet (e.g., Ethernet packet) or a portion of such a 
packet.



Each of the functional units of the processor 12 is
coupled to one or more internal buses. The internal buses
include an internal core bus 34 (labeled "AMBA") for
coupling the processor 23 to the memory controllers 26a, 26b
and to an AMBA translator 36. The processor 12 also
includes a private bus 38 that couples the microengines
22a-22f to the SRAM controller 26b, AMBA translator 36 and the
Fbus interface 28. A memory bus 40 couples the memory
controllers 26a, 26b to the bus interfaces 24, 28 and the
memory system 18.

Referring to FIG. 2, an exemplary one of the
microengines 22a-22f is shown. The microengine 22a includes
a control store 70 for storing a microprogram. The
microprogram is loadable by the central processor 23. The
microengine 22a also includes control logic 72. The control
logic 72 includes an instruction decoder 73 and program
counter units 72a-72d. The four program counters are
maintained in hardware. The microengine 22a also includes
context event switching logic 74. The context event
switching logic 74 receives messages (e.g.,
SEQ_#_EVENT_RESPONSE; FBI_EVENT_RESPONSE;
SRAM_EVENT_RESPONSE; SDRAM_EVENT_RESPONSE; and
AMBA_EVENT_RESPONSE) from each one of the shared resources,
e.g., SRAM 26b, SDRAM 26a, or processor core 23, control and
status registers, and so forth. These messages provide
information on whether a requested function has completed.
Based on whether or not the function requested by a thread
has completed and signaled completion, the thread needs to
wait for that completion signal, and if the thread is enabled
to operate, then the thread is placed on an available thread
list (not shown). As earlier mentioned, the microengine 22a
can have a maximum of four threads of execution available.

In addition to event signals that are local to an
executing thread, the microengine employs signaling states
that are global. With signaling states, an executing thread
can broadcast a signal state to all microengines 22. Any
and all threads in the microengines can branch on these
signaling states. These signaling states can be used to
determine availability of a resource or whether a resource
is due for servicing.

The context event logic 74 has arbitration for the
four threads. In one embodiment, the arbitration is a round
robin mechanism. However, other arbitration techniques,
such as priority queuing or weighted fair queuing, could be
used. The microengine 22a also includes an execution box
(EBOX) data path 76 that includes an arithmetic logic unit
(ALU) 76a and a general purpose register (GPR) set 76b. The
ALU 76a performs arithmetic and logical functions as well as
shift functions.
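
By way of illustration only, the round robin arbitration among
the four threads might be modeled as in the following Python
sketch. The function name round_robin_pick, its parameters and
the list-of-booleans representation of thread readiness are
hypothetical and are not taken from the described hardware.

```python
def round_robin_pick(ready, last, n_threads=4):
    """Return the next ready thread after `last` in circular order, or None.

    `ready[i]` is True when thread i is able to run (an assumed encoding).
    """
    for offset in range(1, n_threads + 1):
        candidate = (last + offset) % n_threads
        if ready[candidate]:
            return candidate
    return None  # no thread is currently able to run


# Threads 0, 2 and 3 are ready and thread 0 ran last, so thread 2 is chosen.
next_thread = round_robin_pick([True, False, True, True], last=0)
```

Starting the scan one position past the thread that ran last is
what keeps any single ready thread from monopolizing the
microengine in this sketch.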

The microengine 22a further includes a write
transfer register file 78 and a read transfer register file
80. The write transfer register file 78 stores data to be
written to a resource. The read transfer register file 80
is for storing return data from a resource. Subsequent to
or concurrent with the data arrival, an event signal from
the respective shared resource, e.g., memory controllers
26a, 26b, or core 23, will be provided to the context event
arbiter 74, which in turn alerts the thread that the data is
available or has been sent. Both transfer register files
78, 80 are connected to the EBOX 76 through a data path. In
the described implementation, each of the register files
includes 64 registers.

The functionality of the microengine threads is
determined by microcode loaded (via the core processor) for
a particular user's application into each microengine's
control store 70. Referring to FIG. 3, an exemplary thread
task assignment 90 is shown. Typically, one of the
microengine threads is assigned to serve as a receive
scheduler 92 and another as a transmit scheduler 94. A
plurality of threads are configured as receive processing
threads 96 and transmit processing (or "fill") threads 98.
Other thread task assignments include a transmit arbiter 100
and one or more core communication threads 102. Once
launched, a thread performs its function independently.

The receive scheduler thread 92 assigns packets to
receive processing threads 96. In a packet forwarding
application for a bridge/router, for example, the receive
processing thread parses packet headers and performs lookups
based on the packet header information. Once the receive
processing thread or threads 96 have processed the packet, it
either sends the packet as an exception to be further
processed by the core 23 (e.g., the forwarding information
cannot be located in lookup and the core processor must
learn it), or stores the packet in the SDRAM and queues the
packet in a transmit queue by placing a packet link
descriptor for it in a transmit queue associated with the
transmit (forwarding) port indicated by the header/lookup.
The transmit queue is stored in the SRAM. The transmit
arbiter thread 100 prioritizes the transmit queues and the
transmit scheduler thread 94 assigns packets to transmit
processing threads that send the packet out onto the
forwarding port indicated by the header/lookup information
during the receive processing.

The receive processing threads 96 may be dedicated
to servicing particular ports or may be assigned to ports
dynamically by the receive scheduler thread 92. For certain
system configurations, a dedicated assignment may be
desirable. For example, if the number of ports is equal to
the number of receive processing threads 96, then it may be
quite practical as well as efficient to assign the receive
processing threads to ports in a one-to-one, dedicated
assignment. In other system configurations, a dynamic
assignment may provide a more efficient use of system
resources.

The receive scheduler thread 92 maintains scheduling
information 104 in the GPRs 76b of the microengine within
which it executes. The scheduling information 104 includes
thread capabilities information 106, port-to-thread
assignments (list) 108 and "thread busy" tracking
information 110. At minimum, the thread capabilities
information informs the receive scheduler thread as to the
type of tasks for which the other threads are configured,
e.g., which threads serve as receive processing threads.
Additionally, it may inform the receive scheduler of other
capabilities that may be appropriate to the servicing of a
particular port. For instance, a receive processing thread
may be configured to support a certain protocol, or a
particular port or ports. A current list of the ports to
which active receive processing threads have been assigned
by the receive scheduler thread is maintained in the
port-to-thread assignments list 108. The thread busy mask register
110 indicates which threads are actively servicing a port.
The receive scheduler uses all of this scheduling
information in selecting threads to be assigned to ports
that require service for available packet data, as will be
described in further detail below.
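
One possible way to picture the scheduling information 104 is
sketched below in Python. The class name SchedulingInfo, its
method names, and the lowest-ID-first selection policy are
illustrative assumptions; the description states only that the
scheduler keeps thread capabilities, port-to-thread assignments
and a thread busy mask.

```python
from dataclasses import dataclass, field


@dataclass
class SchedulingInfo:
    # Thread capabilities 106: IDs of threads configured for receive processing.
    receive_threads: set
    # Port-to-thread assignments 108: port number -> assigned thread ID.
    port_to_thread: dict = field(default_factory=dict)
    # Thread busy mask 110: bit i set means thread i is servicing a port.
    busy_mask: int = 0

    def assign(self, port):
        """Pick an idle receive processing thread for `port`, or None if all are busy."""
        for tid in sorted(self.receive_threads):  # assumed policy: lowest ID first
            if not (self.busy_mask >> tid) & 1:
                self.busy_mask |= 1 << tid
                self.port_to_thread[port] = tid
                return tid
        return None

    def release(self, port):
        """Clear the busy bit once the assigned thread is done with `port`."""
        tid = self.port_to_thread.pop(port)
        self.busy_mask &= ~(1 << tid)
```

A dedicated one-to-one assignment corresponds to filling
port_to_thread once and never calling release, whereas a dynamic
assignment calls assign and release as ports become ready and
threads finish.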

Referring to FIG. 4, the I/O bus interface 28
includes shared resources 120, which are coupled to a
push/pull engine interface 122 and a bus interface unit 124.
The bus interface unit 124 includes a ready bus controller
126 connected to a ready bus 128 and an Fbus controller 130
for connecting to a portion of the I/O bus referred to as an
Fbus 132. Collectively, the ready bus 128 and the Fbus 132
make up the signals of the I/O bus 16 (FIG. 1). The
resources 120 include two FIFOs, a transmit FIFO 134 and a
receive FIFO 136, as well as CSRs 138, a scratchpad memory
140 and a hash unit 142. The Fbus 132 transfers data
between the ports of the devices 30, 31 and the I/O bus
interface 28. The ready bus 128 is an 8-bit bus that
performs several functions. It is used to read control
information about data availability from the devices 30, 31,
e.g., in the form of ready status flags. It also provides
flow control information to the devices 30, 31, and may be
used to communicate with another network processor 12 that
is connected to the Fbus 132. Both buses 128, 132 are
accessed by the microengines 22 through the CSRs 138. The
CSRs 138 are used for bus configuration, for accessing the
bus interface unit 124, and for inter-thread signaling.
They also include several counters and thread status
registers, as will be described. The CSRs 138 are accessed
by the microengines 22 and the core 23. The receive FIFO
(RFIFO) 136 includes data buffers for holding data received
from the Fbus 132 and is read by the microengines 22. The
transmit FIFO (TFIFO) 134 includes data buffers that hold
data to be transmitted to the Fbus 132 and is written by the
microengines 22. The scratchpad memory 140 is accessed by
the core 23 and microengines 22, and supports a variety of
operations, including read and write operations, as well as
bit test, bit test/clear and increment operations. The hash
unit 142 generates hash indexes for 48-bit or 64-bit data
and is accessed by the microengines 22 during lookup
operations.

The processors 23 and 22 issue commands to the
push/pull engine interface 122 when accessing one of the
resources 120. The push/pull engine interface 122 places
the commands into queues (not shown), arbitrates which
commands to service, and moves data between the resources
120, the core 23 and the microengines 22. In addition to
servicing requests from the core 23 and microengines 22, the
push/pull engine interface 122 also services requests from the ready
bus 128 to transfer control information to a register in the
microengine read transfer registers 80.

When a thread issues a request to a resource 120, a 
command is driven onto an internal command bus 150 and 
placed in queues within the push/pull engine interface 122. 
Receive/read-related instructions (such as instructions for 
reading the CSRs) are written to a "push" command queue.

The CSRs 138 include the following types of
registers: Fbus receive and transmit registers; Fbus and
ready bus configuration registers; ready bus control
registers; hash unit configuration registers; interrupt
registers; and several miscellaneous registers, including
thread status registers. Those of the registers which
pertain to the receive process will be described in further
detail.

The interrupt/signal registers include an
INTER_THD_SIG register for inter-thread signaling. Any
thread within the microengines 22 or the core 23 can write a
thread number to this register to signal an inter-thread
event.

Further details of the Fbus controller 130 and the
ready bus controller 126 are shown in FIG. 5. The ready bus
controller 126 includes a programmable sequencer 160 for
retrieving MAC device status information from the MAC
devices 30, 31, and asserting flow control to the MAC
devices over the ready bus 128 via ready bus interface logic
161. The Fbus controller 130 includes Fbus interface logic
162, which is used to transfer data to and from the devices
30, 31, and is controlled by a transmit state machine (TSM) 164
and a receive state machine (RSM) 166. In the embodiment
herein, the Fbus 132 may be configured as a bidirectional
64-bit bus, or two dedicated 32-bit buses. In the
unidirectional, 32-bit configuration, each of the state
machines owns its own 32-bit bus. In the bidirectional
configuration, the ownership of the bus is established
through arbitration. Accordingly, the Fbus controller 130
further includes a bus arbiter 168 for selecting which state
machine owns the Fbus 132.

Some of the relevant CSRs used to program and
control the ready bus 128 and Fbus 132 for receive processes
are shown in FIGS. 6A-6F. Referring to FIG. 6A,
RDYBUS_TEMPLATE_PROGx registers 170 are used to store
instructions for the ready bus sequencer. Each of
these 32-bit registers 170a, 170b, 170c includes four
8-bit instruction fields 172. Referring to FIG. 6B, an
RCV_RDY_CTL register 174 specifies the behavior of the
receive state machine 166. The format is as follows: a
reserved field (bits 31:15) 174a; a fast port mode field
(bits 14:13) 174b, which specifies the fast (Gigabit) port
thread mode, as will be described; an autopush prevent
window field (bits 12:10) 174c for specifying the autopush
prevent window used by the ready bus sequencer to prevent
the receive scheduler from accessing its read transfer
registers when an autopush operation (which pushes
information to those registers) is about to begin; an
autopush enable (bit 9) 174d, used to enable autopush of the
receive ready flags; another reserved field (bit 8) 174e; an
autopush destination field (bits 7:6) 174f for specifying an
autopush operation's destination register; a signal thread
enable field (bit 5) 174g which, when set, indicates the
thread to be signaled after an autopush operation; and a
receive scheduler thread ID (bits 4:0) 174h, which specifies
the ID of the microengine thread that has been configured as
a receive scheduler.
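
The RCV_RDY_CTL bit layout given above can be restated as a
short Python sketch. The helper names bits and
decode_rcv_rdy_ctl and the dictionary keys are hypothetical; only
the bit positions come from the description, and the two reserved
fields are simply ignored.

```python
def bits(value, hi, lo):
    """Extract the inclusive bit range hi:lo from an integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_rcv_rdy_ctl(reg):
    """Split a 32-bit RCV_RDY_CTL value into its non-reserved fields."""
    return {
        "fast_port_mode": bits(reg, 14, 13),           # field 174b
        "autopush_prevent_window": bits(reg, 12, 10),  # field 174c
        "autopush_enable": bits(reg, 9, 9),            # field 174d
        "autopush_destination": bits(reg, 7, 6),       # field 174f
        "signal_thread_enable": bits(reg, 5, 5),       # field 174g
        "rcv_scheduler_thread_id": bits(reg, 4, 0),    # field 174h
    }


# Example value: autopush enabled, scheduler thread ID 7, thread signaled.
example = decode_rcv_rdy_ctl((1 << 9) | (1 << 5) | 7)
```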

Referring to FIG. 6C, a REC_FASTPORT_CTL register
176 is relevant to receiving packet data from fast ports
(fast port mode) only. It enables receive threads to view
the current header and body thread assignments
for the two fast ports, as will be described. It includes
the following fields: a reserved field (bits 31:20) 176a;
an FP2_HDR_THD_ID field (bits 19:15) 176b, which specifies
the fast port 2 header receive (processing) thread ID; an
FP2_BODY_THD_ID field (bits 14:10) 176c for specifying the
fast port 2 body receive processing thread ID; an
FP1_HDR_THD_ID field (bits 9:5) 176d for specifying the fast
port 1 header receive processing thread ID; and an
FP1_BODY_THD_ID field (bits 4:0) 176e for specifying the
fast port 1 body processing thread ID. The manner in which
these fields are used by the RSM 166 will be described in
detail later.
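
As an informal illustration, the four 5-bit thread ID fields of
the REC_FASTPORT_CTL register can be packed and unpacked as
follows. The function names and the lowercase field keys are
hypothetical; the shift amounts follow the bit ranges listed
above.

```python
# Shift amount (lowest bit) of each 5-bit thread ID field.
FASTPORT_FIELDS = {
    "fp2_hdr_thd_id": 15,   # bits 19:15
    "fp2_body_thd_id": 10,  # bits 14:10
    "fp1_hdr_thd_id": 5,    # bits 9:5
    "fp1_body_thd_id": 0,   # bits 4:0
}


def pack_fastport_ctl(**thread_ids):
    """Build a register value from thread IDs; omitted fields default to 0."""
    reg = 0
    for name, shift in FASTPORT_FIELDS.items():
        tid = thread_ids.get(name, 0)
        if not 0 <= tid < 32:
            raise ValueError(f"{name} must fit in 5 bits")
        reg |= tid << shift
    return reg


def unpack_fastport_ctl(reg):
    """Recover the four thread IDs from a register value."""
    return {name: (reg >> shift) & 0x1F for name, shift in FASTPORT_FIELDS.items()}
```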

Although not depicted in detail, other bus registers
include the following: a RDYBUS_TEMPLATE_CTL register 178
(FIG. 6D), which maintains the control information for the
ready bus and the Fbus controllers, for example, it enables
the ready bus sequencer; a RDYBUS_SYNCH_COUNT_DEFAULT
register 180 (FIG. 6E), which specifies the program cycle
rate of the ready bus sequencer; and an FP_FASTPORT_CTL
register 182 (FIG. 6F), which specifies how many Fbus clock
cycles the RSM 166 must wait between the last data transfer
and the next sampling of fast receive status, as will be
described.

Referring to FIG. 7A, the MAC device 30 provides
transmit status flags 200 and receive status flags 202 that
indicate whether the amount of data in an associated
transmit FIFO 204 or receive FIFO 206 has reached a certain
threshold level. The ready bus sequencer 160 periodically
polls the ready flags (after selecting either the receive
ready flags 202 or the transmit ready flags 200 via a flag
select 208) and places them into appropriate ones of the
CSRs 138 by transferring the flag data over ready bus data
lines 209. In this embodiment, the ready bus includes 8
data lines for transferring flag data from each port to the
Fbus interface unit 124. The CSRs in which the flag data
are written are defined as RCV_RDY_HI/LO registers 210 for
receive ready flags and XMIT_RDY_HI/LO registers 212 for
transmit ready flags, if the ready bus sequencer 160 is
programmed to execute receive and transmit ready flag read
instructions, respectively.

When the ready bus sequencer is programmed with an
appropriate instruction directing it to interrogate MAC
receive ready flags, it reads the receive ready flags from
the MAC device or devices specified in the instruction and
places the flags into an RCV_RDY_HI register 210a and an
RCV_RDY_LO register 210b, collectively, RCV_RDY registers
210. Each bit in these registers corresponds to a different
device port on the I/O bus.

As shown in FIG. 7B, the bus interface
unit 124 also supports two fast port receive ready flag pins
FAST_RX1 214a and FAST_RX2 214b for the two fast ports of
the fast MAC device 31. These fast port receive ready flag
pins are read by the RSM 166 directly and placed into an
RCV_RDY_CNT register 216.

The RCV_RDY_CNT register 216 is one of several used 
by the receive scheduler to determine how to issue a receive 
request. It also indicates whether a flow control request 
is issued. 

Referring to FIG. 8A, the format of the RCV_RDY_CNT
register 216 is as follows: bits 31:28 are defined as a
reserved field 216a; bit 27 is defined as a ready bus master
field 216b and is used to indicate whether the ready bus 128
is configured as a master or slave; a field 216c corresponding to
bit 26 provides flow control information; bits 25 and
24 correspond to FRDY2 field 216d and FRDY1 field 216e,
respectively. The FRDY2 216d and FRDY1 216e are used to
store the values of the FAST_RX2 pin 214b and FAST_RX1 pin
214a, respectively, both of which are sampled by the RSM 166
each Fbus clock cycle; bits 23:16 correspond to a reserved
field 216f; a receive request count field (bits 15:8) 216g
specifies a receive request count, which is incremented
after the RSM 166 completes a receive request and data is
available in the RFIFO 136; a receive ready count field
(bits 7:0) 216h specifies a receive ready count, an 8-bit
counter that is incremented each time the ready bus
sequencer 160 writes the ready bus registers RCV_RDY_CNT
register 216, the RCV_RDY_LO register 210b and RCV_RDY_HI
register 210a to the receive scheduler read transfer
registers.
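
For illustration, the RCV_RDY_CNT fields just described can be
extracted from a 32-bit value as sketched below in Python. The
function name decode_rcv_rdy_cnt and the dictionary keys are
hypothetical labels; the bit positions are those given for
FIG. 8A, and reserved bits are ignored.

```python
def decode_rcv_rdy_cnt(reg):
    """Split a 32-bit RCV_RDY_CNT value into the fields described for FIG. 8A."""
    return {
        "rdybus_master": (reg >> 27) & 0x1,  # bit 27, master or slave
        "flow_control": (reg >> 26) & 0x1,   # bit 26
        "frdy2": (reg >> 25) & 0x1,          # bit 25, sampled FAST_RX2 pin
        "frdy1": (reg >> 24) & 0x1,          # bit 24, sampled FAST_RX1 pin
        "rcv_req_count": (reg >> 8) & 0xFF,  # bits 15:8
        "rcv_rdy_count": reg & 0xFF,         # bits 7:0
    }
```

A receive scheduler modeled this way would look at frdy1 and
frdy2 for the fast ports and at the two counts to judge whether
the accompanying ready flags are new.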

There are two techniques for reading the ready bus 
registers: "autopush" and polling. The autopush instruction 
may be executed by the ready bus sequencer 160 during a 
receive process (rxautopush) or a transmit process 
(txautopush). Polling requires that a microengine thread
periodically issue read references to the I/O bus interface 
28. 

The rxautopush operation performs several functions.
It increments the receive ready count in the RCV_RDY_CNT
register 216. If enabled by the RCV_RDY_CTL register 174,
it automatically writes the RCV_RDY_CNT 216, the RCV_RDY_LO
and RCV_RDY_HI registers 210b, 210a to the receive scheduler
read transfer registers and signals to the receive scheduler
thread 92 (via a context event signal) when the rxautopush
operation is complete.

The ready bus sequencer 160 polls the MAC FIFO
status flags periodically and asynchronously to other events
occurring in the processor 12. Ideally, the rate at which
the MAC FIFO ready flags are polled is greater than the
maximum rate at which the data is arriving at the MAC ports.
Thus, it is necessary for the receive scheduler thread 92 to
determine whether the MAC FIFO ready flags read by the ready
bus sequencer 160 are new, or whether they have been read
already. The rxautopush instruction increments the receive
ready count in the RCV_RDY_CNT register 216 each time the
instruction executes. The RCV_RDY_CNT register 216 can be
used by the receive scheduler thread 92 to determine whether
the state of specific flags has to be evaluated or whether
they can be ignored because receive requests have been
issued and the port is currently being serviced. For
example, if the FIFO threshold for a Gigabit Ethernet port
is set so that the receive ready flags are asserted when 64
bytes of data are in the MAC receive FIFO 206, then the
state of the flags does not change until the next 64 bytes
arrive 5120 ns later. If the ready bus sequencer 160 is
programmed to collect the flags four times each 5120 ns
period, the next three sets of ready flags that are to be
collected by the ready bus sequencer 160 can be ignored.
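
The use of the receive ready count to skip stale flags can be
sketched as follows. This is only an illustrative Python model:
the class FlagFreshness, its method names and the per-port
bookkeeping are assumptions, while the 8-bit counter width and
the figure of three ignorable flag sets come from the example
above.

```python
class FlagFreshness:
    """Track, per port, how many flag collections have passed since a request."""

    def __init__(self, skip_after_request=3):
        self.skip = skip_after_request  # flag sets to ignore after a request
        self.issued_at = {}             # port -> ready count when request was issued

    def note_request(self, port, ready_count):
        """Record the receive ready count seen when a request is issued for `port`."""
        self.issued_at[port] = ready_count & 0xFF

    def is_fresh(self, port, ready_count):
        """Return True if the port's flag should be evaluated rather than ignored."""
        if port not in self.issued_at:
            return True
        # The receive ready count is an 8-bit counter, so subtract modulo 256.
        elapsed = (ready_count - self.issued_at[port]) & 0xFF
        return elapsed > self.skip
```

The modulo-256 subtraction keeps the comparison valid when the
counter wraps from 255 back to 0.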

When the receive ready count is used to monitor the
freshness of the receive ready flags, there is a possibility
that the receive ready flags will be ignored when they are
providing new status. For a more accurate determination of
ready flag freshness, the receive request count may be used.
Each time a receive request is completed and the receive
control information is pushed onto the RCV_CNTL register
232, the RSM 166 increments the receive request count.
The count is recorded in the RCV_RDY_CNT register the first
time the ready bus sequencer executes an rxrdy instruction
for each program loop. The receive scheduler thread 92 can
use this count to track how many requests the receive state
machine has completed. As the receive scheduler thread
issues commands, it can maintain a list of the receive
requests it submits and the ports associated with each such
request.

Referring to FIGS. 8B and 8C, the registers 
RCV_RDY_HI 210a and RCV_RDY_LO 210b have a flag bit 217a, 
217b, respectively, corresponding to each port. 
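
A minimal sketch of how the per-port flag bits might be turned
into a list of ports needing service is given below. The
function name ready_ports is hypothetical, and the sketch
assumes, purely for illustration, that RCV_RDY_LO carries the
flags for ports 0-31 and RCV_RDY_HI those for ports 32-63; the
description states only that each bit corresponds to a port.

```python
def ready_ports(rcv_rdy_lo, rcv_rdy_hi=0):
    """Return the port numbers whose receive ready flag bit is set.

    Assumed mapping: bit n of RCV_RDY_LO is port n, bit n of RCV_RDY_HI
    is port 32 + n.
    """
    flags = ((rcv_rdy_hi & 0xFFFFFFFF) << 32) | (rcv_rdy_lo & 0xFFFFFFFF)
    return [port for port in range(64) if (flags >> port) & 1]
```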

Referring to FIG. 9, the receive scheduler thread 92
performs its tasks as quickly as possible to ensure that the
RSM 166 is always busy, that is, that there is always a
receive request waiting to be processed by the RSM 166.
Several tasks performed by the receive scheduler 92 are as
follows. The receive scheduler 92 determines which ports
need to be serviced by reading the RCV_RDY_HI, RCV_RDY_LO
and RCV_RDY_CNT registers 210a, 210b and 216, respectively.
The receive scheduler 92 also determines which receive ready
flags are new and which are old using either the receive
request count or the receive ready count in the RCV_RDY_CNT
register, as described above. It tracks the thread
processing status of the other microengine threads by
reading thread done status CSRs 240. The receive scheduler
thread 92 initiates transfers across the Fbus 132 via the
ready bus, while the receive state machine 166 performs the
actual read transfer on the Fbus 132. The receive scheduler
92 interfaces to the receive state machine 166 through two
FBI CSRs 138: an RCV_REQ register 230 and an RCV_CNTL
register 232. The RCV_REQ register 230 instructs the
receive state machine on how to receive data from the Fbus
132.

Still referring to FIG. 9, a process of initiating 
an Fbus receive transfer is shown. Having received ready 
status information from the RCV_RDY_HI/LO registers 210a, 
210b as well as thread availability from the thread done 
register 240 (transaction "1", as indicated by the arrow 
labeled 1), the receive scheduler thread 92 determines if 
there is room in the RCV_REQ FIFO 230 for another receive 
request. If it determines that RCV_REQ FIFO 230 has room to 
receive a request, the receive scheduler thread 92 writes a 
receive request by pushing data into the RCV_REQ FIFO 230 
(transaction 2). The RSM 166 processes the request in the 
RCV_REQ FIFO 230 (transaction 3). The RSM 166 responds to 
the request by moving the requested data into the RFIFO 136 
(transaction 4), writing associated control information to 
the RCV_CNTL FIFO 232 (transaction 5) and generating a 
start_receive signal event to the receive processing thread 
96 specified in the receive request (transaction 6). The 
RFIFO 136 includes 16 elements 241, each element for storing 
a 64 byte segment of data referred to herein as a MAC packet 
("MPKT"). The RSM 166 reads packets from the MAC ports in 
fragments equal in size to one or two RFIFO elements, that 
is, MPKTs. The specified receive processing thread 96 
responds to the signal event by reading the control 
information from the RCV_CNTL register 232 (transaction 7). 
It uses the control information to determine, among other 
pieces of information, where the data is located in the 
RFIFO 136. The receive processing thread 96 reads the data 
from the RFIFO 136 on quadword boundaries into its read 
transfer registers or moves the data directly into the SDRAM 
(transaction 8). 
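
The transaction sequence described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the class and method names (FbusReceiveModel, ReceiveRequest, scheduler_issue, rsm_step, thread_receive) and the read_port callback are hypothetical, the request is reduced to three values, and transaction 1 (reading the ready and thread done status) is left out. The FIFO depths and the 64-byte element size are taken from the description.

```python
from collections import deque
from dataclasses import dataclass

RCV_REQ_DEPTH = 2     # two-entry RCV_REQ FIFO, per the text
RCV_CNTL_DEPTH = 4    # four-entry RCV_CNTL FIFO, per the text
RFIFO_ELEMENTS = 16   # sixteen RFIFO elements, per the text
MPKT_BYTES = 64       # one RFIFO element holds a 64-byte MPKT


@dataclass
class ReceiveRequest:      # hypothetical, simplified request
    port: int              # MAC port to read from
    element: int           # RFIFO element to fill
    thread_id: int         # receive processing thread to signal


class FbusReceiveModel:
    def __init__(self):
        self.rcv_req = deque()
        self.rcv_cntl = deque()
        self.rfifo = [b""] * RFIFO_ELEMENTS
        self.signaled = []          # thread IDs that were sent start_receive

    def scheduler_issue(self, req):
        """Transaction 2: push a request only if the RCV_REQ FIFO has room."""
        if len(self.rcv_req) >= RCV_REQ_DEPTH:
            return False
        self.rcv_req.append(req)
        return True

    def rsm_step(self, read_port):
        """Transactions 3-6, performed by the receive state machine."""
        if not self.rcv_req or len(self.rcv_cntl) >= RCV_CNTL_DEPTH:
            return False                          # idle, or stalled on a full FIFO
        req = self.rcv_req.popleft()              # 3: take the request
        data = read_port(req.port)[:MPKT_BYTES]
        self.rfifo[req.element] = data            # 4: move data into the RFIFO
        self.rcv_cntl.append(                     # 5: write control information
            {"element": req.element, "nbytes": len(data),
             "thread_id": req.thread_id})
        self.signaled.append(req.thread_id)       # 6: start_receive signal event
        return True

    def thread_receive(self):
        """Transactions 7-8: read control information, then fetch the data."""
        ctl = self.rcv_cntl.popleft()
        return ctl["thread_id"], self.rfifo[ctl["element"]][:ctl["nbytes"]]


model = FbusReceiveModel()
model.scheduler_issue(ReceiveRequest(port=0, element=3, thread_id=5))
model.rsm_step(lambda port: bytes(80))     # hypothetical MAC returning 80 zero bytes
thread_id, mpkt = model.thread_receive()   # at most one MPKT (64 bytes) is delivered
```

The model keeps the scheduler, the state machine and the processing thread as separate methods so that the hand-off points (the two FIFOs and the signal event) stay visible.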

The RCV_REQ register 230 is used to initiate a 
receive transfer on the Fbus and is mapped to a two-entry 
FIFO that is written by the microengines. The I/O bus 
interface provides signals (not shown) to the receive 
scheduler thread indicating that the RCV_REQ FIFO 230 has 
room available for another receive request and that the last 
issued receive request has been stored in the RCV_REQ 
register 230. 

Referring to FIG. 10A, the RCV_REQ FIFO 230 includes 
two entries 231. The format of each entry 231 is as 
follows. The first two bits correspond to a reserved field 
230a. Bit 29 is an FA field 230b for specifying the maximum 
number of Fbus accesses to be performed for this request. A 
THMSG field (bits 28:27) 230c is a two-bit thread message 
field that allows the scheduler thread to pass a message to 
the assigned receive thread through the receive state machine, 
which copies this message to the RCV_CNTL register. An SL 
field 230d (bit 26) is used in cases where status 
information is transferred following the EOP MPKT. It 
indicates whether one or two 32-bit bus accesses are 
required in a 32-bit Fbus configuration. An E1 field 230e 
(bits 21:18) and an E2 field 230f (bits 25:22) specify the 
RFIFO element to receive the transferred data. If only one 
MPKT is received, it is placed in the element indicated by 
the E1 field. If two MPKTs are received, then the second 
MPKT is placed in the RFIFO element indicated by the E2 
field. An FS field (bits 17:16) 230g specifies use of a 
fast or slow port mode, that is, whether the request is 
directed to a fast or slow port. The fast port mode setting 
signifies to the RSM that a sequence number is to be 
associated with the request and that it will be handling 
speculative requests, which will be discussed in further 
detail later. An NFE field (bit 15) 230h specifies the 
number of RFIFO elements to be filled (i.e., one or two 
elements). The IGFR field (bit 13) 230i is used only if 
fast port mode is selected and indicates to the RSM that it 
should process the request regardless of the status of the 
fast ready flag pins. An SIGRS field (bit 11) 230j, if set, 
indicates that the receive scheduler is to be signaled upon 
completion of the receive request. A TID field (bits 10:6) 
230k specifies the receive thread to be notified or signaled 
after the receive request is processed. Therefore, if bit 
11 is set, the RCV_REQ entry must be read twice, once by the 
receive thread and once by the receive scheduler thread, 
before it can be removed from the RCV_REQ FIFO. An RM field 
(bits 5:3) 230l specifies the ID of the MAC device that has 
been selected by the receive scheduler. Lastly, an RP field 
(bits 2:0) 230m specifies which port of the MAC device 
specified in the RM field 230l has been selected. 
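
The entry format just described lends itself to a short packing sketch. The bit positions below are the ones stated in the text (bits 31:30 are reserved, and bits 14 and 12 are not described, so they are left at zero). The helper names pack_fields and unpack_fields are hypothetical, and the thread message field is written THMSG as in the RCV_CNTL entry format.

```python
# Field name -> (high bit, low bit), as given in the RCV_REQ description.
RCV_REQ_FIELDS = {
    "FA": (29, 29),
    "THMSG": (28, 27),
    "SL": (26, 26),
    "E2": (25, 22),
    "E1": (21, 18),
    "FS": (17, 16),
    "NFE": (15, 15),
    "IGFR": (13, 13),
    "SIGRS": (11, 11),
    "TID": (10, 6),
    "RM": (5, 3),
    "RP": (2, 0),
}


def pack_fields(layout, **values):
    """Build a 32-bit word from named field values; unnamed fields stay 0."""
    word = 0
    for name, value in values.items():
        hi, lo = layout[name]
        width = hi - lo + 1
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bit(s)")
        word |= value << lo
    return word


def unpack_fields(layout, word):
    """Split a 32-bit word back into a dict of field values."""
    return {
        name: (word >> lo) & ((1 << (hi - lo + 1)) - 1)
        for name, (hi, lo) in layout.items()
    }


# Example: one MPKT into RFIFO element 3, signal thread 5, MAC device 1, port 2.
request_word = pack_fields(RCV_REQ_FIELDS, E1=3, TID=5, RM=1, RP=2)
```

Keeping the layout in a table rather than in hand-written shifts makes it easy to check each field against the figure.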

The RSM 166 reads the RCV_REQ register entry 231 to 
determine how it should receive data from the Fbus 132, that 
is, how the signaling should be performed on the Fbus; where 
the data should be placed in the RFIFO and which microengine 
thread should be signaled once the data is received. The 
RSM 166 looks for a valid receive request in the RCV_REQ 
FIFO 230. It selects the MAC device identified in the RM 
field and selects the specified port within the MAC by 
asserting the appropriate control signals. It then begins 
receiving data from the MAC device on the Fbus data lines. 
The receive state machine always attempts to read either 
eight or nine quadwords of data from the MAC device on the 
Fbus as specified in the receive request. If the MAC device 
asserts the EOP signal, the RSM 166 terminates the receive 
early (before eight or nine accesses are made). The RSM 166 
calculates the total bytes received for each receive request 
and reports the value in the RCV_CNTL register 232. If EOP 
is received, the RSM 166 determines the number of valid 
bytes in the last received data cycle. 

The RCV_CNTL register 232 is mapped to a four-entry 
FIFO (referred to herein as RCV_CNTL_FIFO 232) that is 
written by the receive state machine and read by the 
microengine thread. The I/O bus interface 28 signals the 
assigned thread when a valid entry reaches the top of the 
RCV_CNTL FIFO. When a microengine thread reads the RCV_CNTL 
register, the data is popped off the FIFO. If the SIGRS 
field 230j is set in the RCV_REQ register 230, the receive 
scheduler thread 92 specified in the RCV_CNTL register 232 
is signaled in addition to the thread specified in TID field 
230k. In this case, the data in the RCV_CNTL register 232 
is read twice before the receive request data is retired 
from the RCV_CNTL FIFO 232 and the next thread is signaled. 
The receive state machine writes to the RCV_CNTL register 232 
as long as the FIFO is not full. If the RCV_CNTL FIFO 232 is 
full, the receive state machine stalls and stops accepting 
any more receive requests. 
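
The pop-on-read behavior, including the case in which an entry is read twice before it is retired, might be modeled as follows. This is a minimal sketch, not the hardware logic: the class name RcvCntlFifo and its methods are hypothetical, and the per-entry read counter is an assumed way of expressing "read twice before retired".

```python
from collections import deque


class RcvCntlFifo:
    DEPTH = 4  # four-entry FIFO, per the text

    def __init__(self):
        self._entries = deque()

    def write(self, word, sigrs=False):
        """State machine side: returns False when the FIFO is full (stall)."""
        if len(self._entries) >= self.DEPTH:
            return False
        # An entry needs two reads when the scheduler is signaled as well.
        self._entries.append([word, 2 if sigrs else 1])
        return True

    def read(self):
        """Thread side: return the top entry and retire it after its last read."""
        entry = self._entries[0]   # raises IndexError if the FIFO is empty
        entry[1] -= 1
        if entry[1] == 0:
            self._entries.popleft()
        return entry[0]

    def __len__(self):
        return len(self._entries)
```

With sigrs set, the first read leaves the entry at the top of the FIFO and the second read retires it, so the next thread would be signaled only after both readers have seen the control information.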

Referring to FIG. 10B, the RCV_CNTL FIFO 232 
provides instruction to the signaled thread (i.e., the 
thread specified in TID) to process the data. As indicated 
above, the RCV_CNTL FIFO includes four entries 233. The format 
of the RCV_CNTL FIFO entry 233 is as follows: a THMSG field 
(31:30) 232a includes the 2-bit message copied by the RSM 
from RCV_REQ register bits [28:27]. A MACPORT/THD field (bits 
29:24) 232b specifies either the MAC port number or a 
receive thread ID, as will be described in further detail 
below. An SOP_SEQ field (23:20) 232c is used for fast ports 
and indicates a packet sequence number as an SOP (start-of- 
packet) sequence number if the SOP was asserted during the 
receive data transfer and indicates an MPKT sequence number 
if SOP was not so asserted. An RF field 232d and RERR field 
232e (bits 19 and 18, respectively) both convey receive 
error information. An SE field 232f (17:14) and an FE field 
232g (13:10) are copies of the E2 and E1 fields, 
respectively, of the RCV_REQ. An EF field (bit 9) 232h 
specifies the number of RFIFO elements which were filled by 
the receive request. An SN field (bit 8) 232i is used for 
fast ports and indicates whether the sequence number 
specified in SOP_SEQ field 232c is associated with fast port 
1 or fast port 2. A VLD BYTES field (7:2) 232j specifies 
the number of valid bytes in the RFIFO element if the 
element contains an EOP MPKT. An EOP field (bit 1) 232k 
indicates that the MPKT is an EOP MPKT. An SOP field (bit 
0) 232l indicates that the MPKT is an SOP MPKT. 
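
A receive processing thread's view of this entry can be sketched as a decoder. The bit ranges are those given in the description; the function name decode_rcv_cntl and the Python-style key names (MACPORT_THD, VLD_BYTES) are illustrative assumptions.

```python
# Field name -> (high bit, low bit), as given in the RCV_CNTL description.
RCV_CNTL_FIELDS = {
    "THMSG": (31, 30),
    "MACPORT_THD": (29, 24),
    "SOP_SEQ": (23, 20),
    "RF": (19, 19),
    "RERR": (18, 18),
    "SE": (17, 14),
    "FE": (13, 10),
    "EF": (9, 9),
    "SN": (8, 8),
    "VLD_BYTES": (7, 2),
    "EOP": (1, 1),
    "SOP": (0, 0),
}


def decode_rcv_cntl(word):
    """Split a 32-bit RCV_CNTL entry into its named fields."""
    return {
        name: (word >> lo) & ((1 << (hi - lo + 1)) - 1)
        for name, (hi, lo) in RCV_CNTL_FIELDS.items()
    }


# Example word: THMSG=2, MACPORT/THD=5, FE=3, VLD BYTES=40, EOP set.
entry = decode_rcv_cntl(0x85000CA2)
```

The twelve fields account for all 32 bits of the entry, so a decoder of this kind needs no reserved-bit handling.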

FIG. 11 illustrates the format of the thread done 
registers 240 and their interaction with the receive 
scheduler and processing threads 92, 96, respectively, of 
the microengines 22. The thread done registers 240 include 
a first thread status register, TH_DONE_REG0 240a, which has 
2-bit status fields 241a corresponding to each of threads 0 
through 15. A second thread status register, TH_DONE_REG1 
240b, has 2-bit status fields 241b corresponding to each of 
threads 16 through 23. These registers can be read and 
written to by the threads using a CSR instruction (or fast 
write instruction, described below). The receive scheduler 
thread can use these registers to determine which RFIFO 
elements are not in use. Since it is the receive scheduler 
thread 92 that assigns receive processing threads 96 to 
process the data in the RFIFO elements, and it also knows 
the thread processing status from the THREAD_DONE_REG0 and 
THREAD_DONE_REG1 registers 240a, 240b, it can determine 
which RFIFO elements are currently available. 

The THREAD_DONE CSRs 240 support a two-bit message 
for each microengine thread. The assigned receive thread 
may write a two-bit message to this register to indicate 
that it has completed its task. Each time a message is 
written to the THREAD_DONE register, the current message is 
logically ORed with the new message. The bit values in the 
THREAD_DONE registers are cleared by writing a "1", so the 
scheduler may clear the messages by writing the data read 
back to the THREAD_DONE register. The definition of the 2- 
bit status field is determined in software. An example of 
four message types is illustrated in TABLE 1 below. 

2-BIT MESSAGE    DEFINITION 
00               Busy. 
01               Idle, processing complete. 
10               Not busy, but waiting to finish processing of 
                 entire packet. 
11               Idle, processing complete for an EOP MPKT. 

TABLE 1 

The assigned receive processing threads write their status 
to the THREAD_DONE register whenever the status changes. 
For example, a thread may immediately write 00 to the 
THREAD_DONE register after the receive state machine signals 
the assigned thread. When the receive scheduler thread 
reads the THREAD_DONE register, it can look at the returned 
value to determine the status of each thread and then update 
its thread/port assignment list. 
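
The OR-on-write and write-one-to-clear behavior of the THREAD_DONE registers can be sketched in software as follows. The class name ThreadDoneRegister and its methods are hypothetical, and the placement of thread i's message at bits 2i+1:2i (relative to the first thread covered by the register) is an assumption, since the description does not give the bit ordering.

```python
class ThreadDoneRegister:
    """Model of one THREAD_DONE CSR holding a 2-bit message per thread."""

    def __init__(self, first_thread, num_threads):
        self.first_thread = first_thread
        self.num_threads = num_threads
        self.value = 0

    def _shift(self, thread):
        index = thread - self.first_thread
        if not 0 <= index < self.num_threads:
            raise ValueError(f"thread {thread} is not covered by this register")
        return 2 * index   # assumed field position for this thread

    def post(self, thread, message):
        """Receive thread side: the new message is ORed with the current one."""
        self.value |= (message & 0b11) << self._shift(thread)

    def read(self):
        """Scheduler side: read all status fields at once."""
        return self.value

    def clear(self, mask):
        """Scheduler side: every bit written as 1 is cleared."""
        self.value &= ~mask

    def status(self, thread):
        return (self.value >> self._shift(thread)) & 0b11


reg0 = ThreadDoneRegister(first_thread=0, num_threads=16)
reg0.post(3, 0b01)         # thread 3 reports "idle, processing complete"
snapshot = reg0.read()     # scheduler reads the register
reg0.post(5, 0b01)         # thread 5 reports after the scheduler's read
reg0.clear(snapshot)       # writing the read data back clears only thread 3
```

Writing back exactly the value that was read clears only the messages the scheduler has already seen, so a message posted between the read and the clear (thread 5 here) is preserved. Under the OR rule, posting 00 leaves a field unchanged in this model.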

The microengine supports a fast_wr instruction that 
improves performance when writing to a subset of CSR 

registers. The fast_wr instruction does not use the push or 
pull engines. Rather, it uses logic that services the 
instruction as soon as the write request is issued to the 
FBI CSR. The instruction thus eliminates the need for the 
pull engine to read data from a microengine transfer 
register when it processes the command. The meaning of the 
10-bit immediate data for some of the CSRs is shown below. 



CSR                   10-BIT IMMEDIATE DATA 
INTER_THD_SIG         Thread number of the thread that is to be 
                      signaled. 
THREAD_DONE           A 2-bit message that is shifted into a 
                      position relative to the thread that is 
                      writing the message. 
THREAD_DONE_INCR1     Same as THREAD_DONE except that either the 
THREAD_DONE_INCR2     enqueue_seq1 or enqueue_seq2 is also 
                      incremented. 
INCR_ENQ_NUM1         Write a one to increment the enqueue 
INCR_ENQ_NUM2         sequence number by one. 

TABLE 2 

It will be appreciated that the receive process as 
described herein assumes that no packet exemptions occurred, 
that is, that the threads are able to handle the packet 
processing without assistance from the core processor. 
Further, the receive process as described also assumes the 
availability of FIFO space. It will be appreciated that the 
various state machines must determine if there is room 
available in a FIFO, e.g., the RFIFO, prior to writing new 
entries to that FIFO. If a particular FIFO is full, the 
state machine will wait until the appropriate number of 
entries has been retired from that FIFO. 

Additions, subtractions, and other modifications of 
the preferred embodiments of the invention will be apparent 
to those practiced in this field and are within the scope of 
the following claims. 